Kullback-Leibler distance as a measure of the information filtered from multivariate data

We show that the Kullback-Leibler distance is a good measure of the statistical uncertainty of
correlation matrices estimated by using a finite set of data. For correlation matrices of multivariate 
Gaussian variables we analytically determine the expected values of the Kullback-Leibler distance of 
a sample correlation matrix from a reference model and we show that the expected values are known 
also when the specific model is unknown. We propose to make use of the Kullback-Leibler distance 
to estimate the information extracted from a correlation matrix by correlation filtering procedures. 
We also show how to use this distance to measure the stability of filtering procedures with respect 
to statistical uncertainty. We explain the effectiveness of our method by comparing four filtering 
procedures, two of them being based on spectral analysis and the other two on hierarchical clustering. 
We compare these techniques as applied both to simulations of factor models and empirical data. 
We investigate the ability of these filtering procedures to recover the correlation matrix of models
from simulations. We discuss this ability in terms of both the heterogeneity of model parameters
and the length of data series. We also show that the two spectral techniques are typically more 
informative about the sample correlation matrix than techniques based on hierarchical clustering, 
whereas the latter are more stable with respect to statistical uncertainty. 

PACS numbers: 02.50.Sk, 05.45.Tp, 05.40.Ca, 02.10.Yn, 89.65.Gh 



I. INTRODUCTION 

The empirical analysis of interactions between the elements of a complex
system is fundamental to understanding both the collective structures and
the basic rules inducing the emergent behavior of complex systems. The
monitoring of several complex systems nowadays produces large sets of
multivariate data. Examples of these sets of data
are present in physical P, 0] , biological [1, 0, Q and eco- 
nomic systems @, 0; 9 and their analysis is an important 
and challenging task in the investigation of complex systems. Much effort
has been devoted to the analysis of multivariate data series, and most of
it focuses on the study of pair cross-correlations. The analysis of
cross-correlations is valuable for detecting the emergence of collective
structures from multivariate data. Classical
spectral methods such as the principal component anal- 
ysis 0, recent related techniques based on concepts of 
random matrix theory d, 0| , hierarchical clustering [l^] , 
factor analysis ^ and graph theory [TT'| are fruitful ap- 
proaches to the analysis of correlations among elements 
of complex systems elicited by multivariate data. 

Cross-correlations estimated from real data are unavoidably affected by
the statistical uncertainty due to the finite size of the sample. In most
cases the length of the data is unavoidably limited, whereas in other
cases it must be limited deliberately so that sizable non-stationary
effects do not introduce large errors in the estimation of correlations.
Cross-correlations might also be affected by noise due to measurement
errors and to the interaction of the system with the environment. In
order to at least partially overcome these problems, it is advisable to
select statistically reliable information from the correlation matrix. We
refer to the selection of the most statistically reliable part of the
correlation matrix as filtering of the correlation matrix.

Several techniques have been proposed in the literature 
in order to filter out information from the correlation 
matrix and therefore it is important to have at hand a 
method for comparing the performance of such different 
techniques in a quantitative way. 

In this paper, we propose to measure the performance of filtering
procedures by using the Kullback-Leibler distance [12], which is a
measure of distance between probability distributions and is widely used
in information theory (see for instance [11]). Specifically, for
multivariate Gaussian variables, we explicitly compute the analytical
form of the Kullback-Leibler distance and we show how it depends on the
correlation matrices of the considered sets of data or of filtered
versions of them. Under the same assumptions we analytically obtain the
expected values of the Kullback-Leibler distance between the correlation
matrix of a multivariate model and a sample correlation matrix obtained
with the Pearson estimator from a finite set of data. One of our key
results is that these expected values are model independent. This result
shows that the Kullback-Leibler distance is well suited to quantifying
the amount of information present in a sample correlation matrix with
respect to a hypothetical reference model, even when the specific nature
of the model is unknown. We are also able to compute the expected value
of the Kullback-Leibler distance between two distinct samples of the
correlation matrix obtained from the same random source. This last
quantity is very useful in quantifying the stability associated with any
sample estimation, and specifically the stability of the correlation
matrices obtained from filtering procedures.

We show the effectiveness of the use of the Kullback- 
Leibler distance in comparing data and models and in 
assessing the stability of the estimation of the sample 
correlation matrix by investigating four different filter- 
ing methods. Two of them are based on spectral anal- 
ysis, while the other two are generated by hierarchical 
clustering procedures. A good filtered correlation matrix 
is supposed to be informative about the sample corre- 
lation matrix and, at the same time, to be statistically 
more robust than the sample matrix itself with respect to 
statistical uncertainty. In our investigation we consider 
in a quantitative way both these aspects. 

The paper is organized as follows. In section II we present the
analytical results for the expected values of the Kullback-Leibler
distance and we show how the Kullback-Leibler distance can be used as an
estimator of the goodness of filtering procedures. In section III we
describe the four filtering procedures that we quantitatively compare in
section IV, both by investigating simulations and real data. Finally, in
section V we draw our conclusions.



II. KULLBACK-LEIBLER DISTANCE

The Kullback-Leibler distance (see for instance [12]) or mutual entropy
is a measure of the distance between two probability densities, say $p$
and $q$, which is defined as

$$K(p, q) = E_p\left[\log\frac{p}{q}\right], \qquad (1)$$

where $E_p[\cdot]$ indicates the expectation value with respect to the
probability density $p$. The Kullback-Leibler distance is asymmetric. In
Eq. (1) the expectation value is evaluated according to the distribution
$p$. Since the property of symmetry is sometimes important, a
symmetrization of the Kullback-Leibler distance, called the
Jeffreys-Kullback-Leibler J-divergence, has been introduced [13]. In
other cases the asymmetry can also be a useful feature of a distance
measure. This is the case when objects of different nature (or simply
with different statistical meaning) are compared. The Kullback-Leibler
distance is widely used in information theory. The mutual information
between two random variables $X$ and $Y$ is defined as
$K(p(X,Y), p(X)p(Y))$ (see for instance [11]), where $p(X,Y)$ is the
joint probability density function of $X$ and $Y$, whereas $p(X)$ and
$p(Y)$ are the corresponding marginal probabilities. In this case, the
asymmetry is important because the mutual information measures the error
one commits in considering two random variables as independent
variables. Accordingly, this measure is performed by evaluating the
distance between the correct joint probability $p(X,Y)$ and the product
$p(X)p(Y)$, averaging the result over $p(X,Y)$.

Here we consider the Kullback-Leibler distance between multivariate
Gaussian random variables. We consider variables with zero mean and unit
variance without loss of generality, because we are interested in the
comparison of the correlation matrices of the two sets of variables. In
this case, the Gaussian multivariate distribution associated with the
random vector $X$ is completely defined by the correlation matrix $S$ of
the system. In the following we indicate the probability density function
with $P(S, X)$. Given two different probability density functions
$P(S_1, X)$ and $P(S_2, X)$, we have



$$K(P(S_1, X), P(S_2, X)) = E_{P(S_1, X)}\left[\log\frac{P(S_1, X)}{P(S_2, X)}\right]
  = \int P(S_1, X)\, \log\frac{P(S_1, X)}{P(S_2, X)}\, dX. \qquad (2)$$



By performing the integral in Eq. (2) one obtains:

$$K(P(S_1, X), P(S_2, X)) = \frac{1}{2}\left[\log\left(\frac{|S_2|}{|S_1|}\right)
  + \mathrm{tr}\left(S_2^{-1} S_1\right) - n\right], \qquad (3)$$

where $n$ is the dimension of the space spanned by the variable $X$ and
$|S|$ indicates the determinant of $S$. In Appendix A we show how to
derive the last equation from Eq. (2). Eq. (3) shows that the
Kullback-Leibler distance is an explicit function of only the
correlation matrices $S_1$ and $S_2$ for multivariate normal
distributions. Therefore, from now on we indicate
$K(P(S_1, X), P(S_2, X))$ simply with $K(S_1, S_2)$. It is worth noting
that the Kullback-Leibler distance naturally takes into account the
statistical nature of correlation matrices. Indeed $K(S_1, S_2)$ is well
defined only provided that the matrices $S_1$ and $S_2$ are positive
definite. This property is not common to other measures of distance
between matrices which are based essentially on the isomorphism between
the matrix space and a vector space, e.g. the Frobenius distance (see
below). However this property can also be a limitation. The
Kullback-Leibler distance cannot be used to quantify the distance between
positive semidefinite (singular) correlation matrices, which are observed
when the length $T$ of data series is smaller than the number $n$ of
elements of the system. The Kullback-Leibler distance is also related to
the Maximum Likelihood Factor Analysis (MLFA). In fact, the
log-likelihood function to be maximized in order to describe a system of
$n$ elements with sample correlation matrix $C$, estimated from data
series of length $T$, with a certain $k$-factor model with correlation
matrix $S_k$ is given by:

$$L(C, S_k) = -T\, K(C, S_k) - \frac{T}{2}\left[\log(|2\pi C|) + n\right]. \qquad (4)$$

In the MLFA, $L(C, S_k)$ is maximized with respect to $S_k$. This
maximization is therefore equivalent to minimizing the Kullback-Leibler
distance $K(C, S_k)$ with respect to $S_k$, because the other terms in
Eq. (4) are independent of $S_k$. Note that in Eq. (4) the empirical
correlation matrix $C$ is the one estimated from the investigated data
and one calibrates the correlation matrix $S_k$ of the model by
maximizing $L(C, S_k)$. This fact explains why the log-likelihood depends
on $K(C, S_k)$ instead of $K(S_k, C)$.
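As an illustration of Eq. (3), the following minimal sketch evaluates the Kullback-Leibler distance between two correlation matrices given as nested Python lists. The function names `cholesky` and `kl_distance` are hypothetical and are not taken from the paper; the Cholesky-based evaluation of the determinants and of the trace is simply one convenient choice, which also makes the requirement of positive definiteness explicit.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    # K(S1, S2) is defined only for positive definite matrices.
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def kl_distance(s1, s2):
    """K(S1, S2) = 0.5 * [log(|S2| / |S1|) + tr(S2^-1 S1) - n], as in Eq. (3)."""
    n = len(s1)
    l1, l2 = cholesky(s1), cholesky(s2)
    # log|S| = 2 * sum_i log(L_ii) for S = L L^T
    logdet1 = 2.0 * sum(math.log(l1[i][i]) for i in range(n))
    logdet2 = 2.0 * sum(math.log(l2[i][i]) for i in range(n))
    # tr(S2^-1 S1) equals the sum of squared entries of Y, where L2 Y = L1.
    trace = 0.0
    for col in range(n):
        y = [0.0] * n
        for i in range(n):
            acc = l1[i][col] - sum(l2[i][m] * y[m] for m in range(i))
            y[i] = acc / l2[i][i]
        trace += sum(v * v for v in y)
    return 0.5 * (logdet2 - logdet1 + trace - n)


# Illustrative inputs (not from the paper): the distance is asymmetric.
s_a = [[1.0, 0.5], [0.5, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(kl_distance(s_a, identity), kl_distance(identity, s_a))
```

The two printed values differ, which reflects the asymmetry discussed above; a singular matrix, such as a sample correlation matrix with $T < n$, raises an error instead of returning a value.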

In this paper we want to apply the Kullback-Leibler distance to sample
correlation matrices obtained with the Pearson estimator. Since different
realizations of the process give rise to different samples, a
Kullback-Leibler distance having one or two sample correlation matrices
as arguments is a function of one or two random matrices. It is known
that sample covariance matrices of finite variance variables belong to
the ensemble of Wishart random matrices, and many statistical properties
of Wishart matrices are known. It is therefore useful to investigate the
statistical properties of the Kullback-Leibler distance involving sample
correlation matrices of multivariate Gaussian random variables. These
properties will be useful in the next section as absolute terms of
comparison for filtering procedures of the correlation matrix.

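Let us consider a multinormally distributed random vector $X$ of
dimension $n$ with correlation matrix $S$. Let $C_1$ and $C_2$ be two
sample correlation matrices obtained from two independent realizations of
the system, both of length $T$. By making use of the theory of Wishart
matrices we obtain that

$$E[K(S, C_1)] = \frac{1}{2}\left\{ n \log\left(\frac{2}{T}\right)
  + \sum_{p=T-n+1}^{T} \frac{\Gamma'(p/2)}{\Gamma(p/2)}
  + \frac{n(n+1)}{T-n-1} \right\}, \qquad (5)$$

$$E[K(C_1, S)] = \frac{1}{2}\left\{ n \log\left(\frac{T}{2}\right)
  - \sum_{p=T-n+1}^{T} \frac{\Gamma'(p/2)}{\Gamma(p/2)} \right\} \qquad (6)$$

and

$$E[K(C_1, C_2)] = \frac{1}{2}\, \frac{n(n+1)}{T-n-1}, \qquad (7)$$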

where $\Gamma(x)$ is the usual Gamma function and $\Gamma'(x)$ is the
derivative of $\Gamma(x)$. In Appendix B we show how to derive these
expectation values. Finally, it is possible to give the asymptotic value
of the standard deviation of $K(C_1, S)$ by using the Bartlett statistics
[15]. Specifically, if $T > 1$, $n > 1$ and $Q = T/n > 1$ we infer that
the standard deviation of $K(C_1, S)$ is $\sigma_K \simeq 1/(2Q)$. It is
important to observe that all the expectation values given in Eqs.
(5)-(7) are independent of $S$, i.e. they are independent of the specific
model. This fact implies that (i) the Kullback-Leibler distance is a good
measure of the statistical uncertainty of a correlation matrix which is
due to the finite length of data series and (ii) the expected value of
the Kullback-Leibler distance is known also when the underlying model
hypothesized to describe the system is unknown. This fact has important
consequences. Suppose one knows that the observed data are well
approximated by a multivariate Gaussian distribution and that one
measures a sample correlation matrix $C$. In order to remove some of the
unavoidable statistical uncertainty, the experimenter applies a filtering
procedure to the data, obtaining the filtered correlation matrix
$C^{filt}$. If the filtering technique is able to recover the model
correlation matrix, i.e. $C^{filt} = S$, the Kullback-Leibler distance
$K(C, C^{filt})$ must be equal on average to the value given in Eq. (6).
This expected value is independent of the (unknown) model correlation
matrix $S$. Therefore large deviations from this expectation value
indicate that the filtered matrix is not consistent with the true matrix
of the system. If $K(C, C^{filt})$ is significantly smaller (in terms of
the error $\sigma_K \simeq 1/(2Q)$) than the expectation value of Eq.
(6), it means that the filtering procedure has at most partially removed
the statistical uncertainty, i.e. the filtered matrix is keeping some of
the statistical uncertainty due to the finite length $T$. If, on the
other hand, $K(C, C^{filt})$ is significantly larger than the value of
Eq. (6), it means that the filtered matrix is either filtering too much
information or distorting the signal. The distance between
$K(C, C^{filt})$ and the expected value of Eq. (6) is a measure of the
goodness of the filtering procedure in keeping the maximal amount of
information which can be present in sample correlation matrices
estimated with a finite number of records.
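The expectation values of Eqs. (5)-(7) depend only on $n$ and $T$, so they can be tabulated once and used as benchmarks. The sketch below is one possible way to evaluate them with the Python standard library only. The function names are hypothetical, and the `digamma` helper is an assumption of this example: it approximates $\Gamma'(x)/\Gamma(x)$ with the standard recurrence and asymptotic series rather than calling a library routine.

```python
import math


def digamma(x):
    """Gamma'(x) / Gamma(x) for x > 0, via recurrence and an asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    shift = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series


def _digamma_sum(n, T):
    """Sum over p = T-n+1, ..., T of Gamma'(p/2) / Gamma(p/2)."""
    if T < n:
        raise ValueError("the sample correlation matrix is singular for T < n")
    return sum(digamma(p / 2.0) for p in range(T - n + 1, T + 1))


def expected_kl_two_samples(n, T):
    """E[K(C1, C2)], Eq. (7)."""
    if T <= n + 1:
        raise ValueError("Eq. (7) requires T > n + 1")
    return 0.5 * n * (n + 1) / (T - n - 1)


def expected_kl_sample_to_model(n, T):
    """E[K(C1, S)], Eq. (6)."""
    return 0.5 * (n * math.log(T / 2.0) - _digamma_sum(n, T))


def expected_kl_model_to_sample(n, T):
    """E[K(S, C1)], Eq. (5), written as Eq. (7) minus Eq. (6)."""
    return expected_kl_two_samples(n, T) - expected_kl_sample_to_model(n, T)


# Benchmarks for n = 100 series of T = 748 records.
n, T = 100, 748
print(expected_kl_sample_to_model(n, T), 1.0 / (2.0 * T / n))  # Eq. (6), sigma_K
print(expected_kl_two_samples(n, T))                            # Eq. (7)
```

Writing Eq. (5) as the difference of Eqs. (7) and (6) follows directly from the three formulas above: the logarithmic and Gamma-function terms of Eqs. (5) and (6) cancel when the two are added.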

A second aspect concerns the stability of the filtered correlation matrix
obtained from a sample matrix. Suppose we apply a certain filtering
procedure to the correlation matrices $C_1$ and $C_2$ of two independent
realizations of the system, obtaining two filtered correlation matrices
$C_1^{filt}$ and $C_2^{filt}$. If it turns out that
$K(C_1^{filt}, C_2^{filt})$ is larger than the expected value of
$K(C_1, C_2)$ described by Eq. (7), one can conclude that the filtering
procedure produces correlation matrices less reproducible than the sample
correlation matrices and therefore the procedure is not suitable for the
purpose of filtering robust information from the empirical correlation
matrices $C_1$ and $C_2$.

In summary we have shown that the Kullback-Leibler distance is well
suited to comparing correlation matrices because (i) it is an asymmetric
distance and therefore it can distinguish between quantities observed in
real systems and quantities used to model the empirical observations,
e.g. the sample correlation matrix and the filtered correlation matrix
respectively; (ii) the expectation values of the Kullback-Leibler
distance given in Eqs. (5)-(7) are model independent, indicating that
this distance is a good estimator of the statistical uncertainty due to
the finite size of the empirical sample; (iii) the Kullback-Leibler
distance is intimately related to the log-likelihood function used in
MLFA and (iv) it is deeply related to concepts of information theory,
such as the mutual information. These properties are not observed in
other widespread distances between matrices. For example, we shall show
that we do not find these properties in the Frobenius distance, which is
a standard measure of the distance between matrices.

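The Frobenius distance between two $n \times n$ matrices $S_1$ and $S_2$,
of real elements $s^{1}_{ij}$ and $s^{2}_{ij}$ respectively, is defined
as

$$F(S_1, S_2) = \sqrt{\mathrm{tr}\left[(S_1 - S_2)(S_1 - S_2)^{T}\right]}
  = \sqrt{\sum_{i,j} \left(s^{1}_{ij} - s^{2}_{ij}\right)^2}. \qquad (8)$$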



We note that the Frobenius distance is symmetric. 
Therefore it cannot assign a different role to a model 
correlation matrix S with respect to some sample C of 
S. We also observe that this distance is well defined in- 
dependently of the statistical nature of matrices Si and 
S2, i.e. these matrices can also be non positive definite. 
Finally, and more importantly, we want to show, for a simple system of
two variables, that the expectation value of the Frobenius distance
between a true correlation matrix and its Pearson estimator is model
dependent, i.e. this expectation value depends on the true correlation
matrix.

Let us consider a bivariate normal distribution $N(0, S)$, where $S$ is a
$2 \times 2$ correlation matrix and $0$ is the null vector of dimension
2. We indicate the only entry of $S$ different from 1 with $\rho$. The
sample correlation matrix $C$ is defined as

$$C = \begin{pmatrix} 1 & \hat{\rho} \\ \hat{\rho} & 1 \end{pmatrix}, \qquad (9)$$

where $\hat{\rho}$ is the Pearson correlation coefficient estimated from
a realization of $N(0, S)$ of length $T$. It follows that

$$F(S, C) = \sqrt{2}\, |\rho - \hat{\rho}|. \qquad (10)$$

The distribution of $\hat{\rho}$ is approximately Gaussian for large
values of $T$. The mean value of $\hat{\rho}$ is $\rho$ and the standard
deviation is $(1 - \rho^2)/\sqrt{T}$. Accordingly, the expectation value
of the Frobenius distance between the two matrices is:

$$E[F(S, C)] = \frac{2\,(1 - \rho^2)}{\sqrt{\pi T}}. \qquad (11)$$

This result shows that the expectation value of the Frobenius distance is
model dependent and therefore the Frobenius distance is not a good
estimator of the statistical uncertainty of a correlation matrix due to
the finite length of data series.
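The model dependence can be made concrete with a few numbers. The sketch below assumes only what is stated in the text: $F(S, C) = \sqrt{2}\,|\rho - \hat{\rho}|$ and an approximately Gaussian $\hat{\rho}$ with mean $\rho$ and standard deviation $(1 - \rho^2)/\sqrt{T}$. It then uses the standard result that a zero-mean Gaussian with standard deviation $\sigma$ has mean absolute value $\sigma\sqrt{2/\pi}$. The function name is hypothetical.

```python
import math


def expected_frobenius_bivariate(rho, T):
    """Large-T estimate of E[F(S, C)] for a 2 x 2 correlation matrix."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if T <= 0:
        raise ValueError("T must be positive")
    sigma = (1.0 - rho * rho) / math.sqrt(T)            # std of the Pearson estimator
    mean_abs_error = sigma * math.sqrt(2.0 / math.pi)   # E|z| for z ~ N(0, sigma^2)
    return math.sqrt(2.0) * mean_abs_error


# Same series length, different true correlations: the benchmark changes.
for rho in (0.0, 0.5, 0.9):
    print(rho, expected_frobenius_bivariate(rho, T=748))
```

For a fixed length $T$ the printed value shrinks as $|\rho|$ grows, so a reference value for the Frobenius distance cannot be given without knowing the model, in contrast with Eqs. (5)-(7).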



III. FILTERING PROCEDURES 

In this section we describe four procedures that can be used to filter
correlation matrices. Two procedures are based on spectral techniques,
i.e. they are based on the comparison between the spectrum of the sample
correlation matrix and the spectrum expected for a random matrix. These
procedures are described in some detail in subsection III A. The other
two techniques that we consider here are hierarchical clustering
procedures. Specifically, we obtain two different filtered matrices by
applying the Single Linkage Cluster Analysis (SLCA) and the Average
Linkage Cluster Analysis (ALCA) to the sample correlation matrix of the
system. The ALCA and SLCA are standard procedures of hierarchical
clustering and we describe how these techniques generate filtered
correlation matrices in subsection III B.



A. Spectral methods 

Random matrix theory [17] was originally developed in nuclear physics and
then applied to many different fields. Let us consider $n$ independent
random variables with finite variance and $T$ records each. The sample
correlation matrix of the system in the limit $T \to \infty$ is simply
the identity matrix. When $T$ is finite the correlation matrix will in
general be different from the identity matrix. Random matrix theory
allows one to prove that in the limit $T, n \to \infty$, with a fixed
ratio $Q = T/n > 1$, the eigenvalues of the sample correlation matrix $C$
cannot be larger than

$$\lambda_{max} = \sigma^2 \left(1 + \frac{1}{Q} + 2\sqrt{\frac{1}{Q}}\right), \qquad (12)$$

where $\sigma^2 = 1$ for correlation matrices. The idea underlying both
the spectral filtering procedures considered here is that of reducing the
impact of eigenvalues smaller than $\lambda_{max}$ on the structure of an
empirical correlation matrix, in order to remove the effects of those
eigenvalues that are consistent with the null hypothesis of uncorrelated
random variables. In some practical cases, such as for example in
finance, one finds that the largest eigenvalue $\lambda_1$ of the
empirical correlation matrix is definitely inconsistent with random
matrix theory. In these cases, the null hypothesis is modified so that
correlations can be explained in terms of a one factor model.
Accordingly, when $\lambda_1 \gg \lambda_{max}$ we set
$\sigma^2 = 1 - \lambda_1/n$ in Eq. (12).
The first filtering procedure we consider here has been used by Rosenow
et al. in Ref. [18]. The technique consists in replacing the eigenvalues
smaller than $\lambda_{max}$ in the diagonal matrix $D$ of eigenvalues of
$C$ with 0's, thus obtaining a new diagonal matrix $D_S$. One can
therefore compute the matrix $Q_S = V D_S V^T$ of elements $q^S_{ij}$,
where $V$ is the matrix of eigenvectors of $C$. Finally, the filtered
correlation matrix $C^S$ of elements $c^S_{ij}$ is obtained by forcing
the diagonal elements of $Q_S$ to 1, i.e.
$c^S_{ij} = \delta_{ij} + q^S_{ij}(1 - \delta_{ij})$, where
$\delta_{ij}$ is the standard Kronecker symbol. The second procedure we
apply has been considered by Potters et al. in Ref. [19]. Here,
eigenvalues smaller than $\lambda_{max}$ in $D$ are replaced with their
average value in the diagonal matrix $D_B$. As in the previous case, one
rotates the matrix $D_B$, getting the matrix $Q_B = V D_B V^T$ of
elements $q^B_{ij}$, where again $V$ is the matrix of eigenvectors of
$C$. Finally, the filtered correlation matrix $C^B$ is the matrix of
elements $c^B_{ij} = q^B_{ij} / \sqrt{q^B_{ii}\, q^B_{jj}}$. Both the
matrices $C^S$ and $C^B$ satisfy the properties of a correlation matrix,
i.e. (i) they are positive definite; (ii) their diagonal elements are
equal to 1 and (iii) their off-diagonal elements are in absolute value
smaller than or equal to 1.
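The individual steps of the two spectral procedures can be sketched as follows. This example assumes that the eigenvalues and eigenvectors of $C$ have already been obtained from a linear algebra routine of the reader's choice, and that the rotated matrix has been rebuilt from the modified eigenvalues; only the threshold of Eq. (12), the replacement of the small eigenvalues and the final normalization are shown. All function names are hypothetical.

```python
import math


def lambda_max(q, sigma2=1.0):
    """Upper edge of the random-matrix spectrum, Eq. (12), with Q = T / n."""
    if q <= 0.0:
        raise ValueError("Q must be positive")
    return sigma2 * (1.0 + 1.0 / q + 2.0 * math.sqrt(1.0 / q))


def filter_eigenvalues(eigenvalues, threshold, mode):
    """Replace eigenvalues below the threshold with 0 ("zero") or their mean ("average")."""
    small = [lam for lam in eigenvalues if lam < threshold]
    if mode == "zero":
        fill = 0.0
    elif mode == "average":
        fill = sum(small) / len(small) if small else 0.0
    else:
        raise ValueError("mode must be 'zero' or 'average'")
    return [lam if lam >= threshold else fill for lam in eigenvalues]


def force_unit_diagonal(q_matrix):
    """First procedure: keep off-diagonal entries, set the diagonal to 1."""
    n = len(q_matrix)
    return [[1.0 if i == j else q_matrix[i][j] for j in range(n)] for i in range(n)]


def rescale_to_unit_diagonal(q_matrix):
    """Second procedure: c_ij = q_ij / sqrt(q_ii * q_jj)."""
    n = len(q_matrix)
    return [[q_matrix[i][j] / math.sqrt(q_matrix[i][i] * q_matrix[j][j])
             for j in range(n)] for i in range(n)]


# Threshold for n = 100 series of T = 748 records, with sigma^2 = 1.
print(lambda_max(748 / 100))
```

Keeping the eigenvalue replacement separate from the eigendecomposition makes it easy to swap the "zero" and "average" rules while reusing the same eigenvectors.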



B. Hierarchical Clustering Procedures 

Another approach used to filter the information associated with the
correlation matrix is given by hierarchical clustering analysis. Let us
consider a set of $n$ objects and suppose that a similarity measure, e.g.
the correlation coefficient, between pairs of elements is defined.
Similarity measures can be written in an $n \times n$ similarity matrix.
The hierarchical clustering methods allow one to organize the elements
hierarchically in clusters. A result of the procedure is a rooted tree or
dendrogram giving a quantitative description of the clusters thus
obtained. Another result of the procedure is a filtered correlation
matrix. Indeed the whole information about the rooted tree can be stored
in an $n \times n$ matrix $C^{<}$. We have recently shown [20] that, when
the entries of $C^{<}$ are non negative numbers, this matrix is the
correlation matrix of a suitable factor model, that we have named
Hierarchically Nested Factor Model (HNFM). This result ensures that,
under the condition of non negative entries of $C^{<}$ (typically
satisfied in many empirical applications), this matrix is a true
correlation matrix, i.e. it is positive definite.

A large number of hierarchical clustering procedures 
can be found in the literature. For a review about the 
classical techniques see for instance Ref. In this 

paper we focus our attention on the SLCA and the
ALCA.

The starting point of both the procedures is the empirical correlation
matrix $C$. The following procedure performs the ALCA giving as an output
a rooted tree and a filtered correlation matrix $C^{<}_{ALCA}$ of
elements $c^{<}_{ij}$:

1. Set $B = C$.

2. Select the maximum correlation $b_{hk}$ in the correlation matrix $B$.
Note that after the first step of construction $h$ and $k$ can be simple
elements (i.e. clusters of one element each) or clusters (sets of
elements). $\forall i \in h$ and $\forall j \in k$ one sets the elements
$c^{<}_{ij}$ of the matrix $C^{<}_{ALCA}$ as
$c^{<}_{ij} = c^{<}_{ji} = b_{hk}$.

3. Merge cluster $h$ and cluster $k$ into a single cluster, say $q$. The
merging operation identifies a node in the rooted tree connecting
clusters $h$ and $k$ at the correlation $b_{hk}$.

4. Redefine the matrix $B$:

$$b_{qj} = \frac{n_h\, b_{hj} + n_k\, b_{kj}}{n_h + n_k} \quad \text{if } j \neq h \text{ and } j \neq k,$$
$$b_{ij} = b_{ij} \quad \text{otherwise},$$

where $n_h$ and $n_k$ are the numbers of elements belonging respectively
to the cluster $h$ and to the cluster $k$ before the merging operation.
Note that if the dimension of $B$ is $m \times m$ then the dimension of
the redefined $B$ is $(m - 1) \times (m - 1)$ because of the merging of
clusters $h$ and $k$ into the cluster $q$.

5. If the dimension of $B$ is larger than 1 then go to step 2, else stop.

By replacing point 4 of the above algorithm with the following item

4. Redefine the matrix $B$:

$$b_{qj} = \max[b_{hj}, b_{kj}] \quad \text{if } j \neq h \text{ and } j \neq k,$$
$$b_{ij} = b_{ij} \quad \text{otherwise},$$

one obtains an algorithm performing the SLCA and the associated filtered
correlation matrix $C^{<}_{SLCA}$. In the following, we indicate the
matrices $C^{<}_{SLCA}$ and $C^{<}_{ALCA}$ with $C^{SLCA}$ and
$C^{ALCA}$, respectively, in order to simplify the notation.
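The five steps above can be sketched in a few lines of Python. The function name `linkage_filter` and its `method` argument are hypothetical; the sketch keeps the matrix $B$ as a nested list, tracks which original elements belong to each cluster, and returns only the filtered matrix (the rooted tree is not stored). Ties between equal correlations are broken by taking the first pair found, which is an assumption of this example.

```python
def linkage_filter(c, method="average"):
    """Filtered correlation matrix from ALCA ("average") or SLCA ("single")."""
    if method not in ("average", "single"):
        raise ValueError("method must be 'average' or 'single'")
    n = len(c)
    b = [list(row) for row in c]            # step 1: B = C
    clusters = [[i] for i in range(n)]      # original elements in each row of B
    filtered = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while len(clusters) > 1:                # step 5: loop until B is 1 x 1
        m = len(clusters)
        # step 2: most correlated pair of clusters (h, k)
        h, k = max(((i, j) for i in range(m) for j in range(i + 1, m)),
                   key=lambda pair: b[pair[0]][pair[1]])
        for i in clusters[h]:
            for j in clusters[k]:
                filtered[i][j] = filtered[j][i] = b[h][k]
        # steps 3-4: merge h and k into q and redefine B
        nh, nk = len(clusters[h]), len(clusters[k])
        keep = [j for j in range(m) if j not in (h, k)]
        if method == "average":
            row_q = [(nh * b[h][j] + nk * b[k][j]) / (nh + nk) for j in keep]
        else:
            row_q = [max(b[h][j], b[k][j]) for j in keep]
        b = [[b[i][j] for j in keep] + [row_q[a]] for a, i in enumerate(keep)]
        b.append(row_q + [1.0])
        clusters = [clusters[j] for j in keep] + [clusters[h] + clusters[k]]
    return filtered


# Illustrative 3 x 3 input (not from the paper).
c = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
print(linkage_filter(c, "average"))
print(linkage_filter(c, "single"))
```

In this small example elements 0 and 1 merge first at correlation 0.8; the third element then joins at the average of 0.3 and 0.5 under ALCA and at their maximum under SLCA, so the two filtered matrices differ only in those entries.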



IV. COMPARISON OF FILTERING 
PROCEDURES 

We have applied the four filtering procedures described in the previous
section to both real and artificial systems. We have considered the real
system of daily returns of the 100 most capitalized stocks traded at the
New York Stock Exchange (NYSE) in the time period from January 2001 to
December 2003. In this case, the length of the $n = 100$ time series is
$T = 748$ records. We have also considered the system of daily returns of
92 highly capitalized stocks traded at the London Stock Exchange in 2002.
The length of the $n = 92$ time series is $T = 250$ for this system. We
have also applied the filtering procedures to two artificial systems of
$n = 100$ elements each. Both these systems are described by a factor
model. A factor model is a mathematical model which describes the
correlation among a set of elements that we indicate with $x_i$
($i = 1, ..., n$), in terms of a certain number of common factors $f_k$
($k = 1, ..., P$). The linear dependence of elements on factors is
mathematically expressed as

$$x_i(t) = \sum_{k=1}^{P} \gamma_{ik} f_k(t) + \eta_i \epsilon_i(t), \qquad (13)$$

where $i \in \{1, ..., n\}$ and
$\eta_i = [1 - \sum_{k=1}^{P} \gamma_{ik}^2]^{1/2}$. The $k$-th factor
$f_k(t)$ and $\epsilon_i(t)$ are independent identically distributed
random variables with zero mean and unit variance. In our simulations,
the factors $f_k(t)$ ($k = 1, ..., P$) and the idiosyncratic noises
$\epsilon_i(t)$ ($i = 1, ..., n$) are Gaussian random variables.
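A factor model of the form of Eq. (13) is straightforward to simulate. The sketch below is a minimal illustration with Gaussian factors and noises; the function names and the small loading matrix `gamma` are hypothetical and much smaller than the $n = 100$ systems studied in the paper. The idiosyncratic weight is taken as the square root of one minus the sum of squared loadings, which is what gives each element unit variance, and the model correlation between two distinct elements is then the sum over factors of the products of their loadings.

```python
import math
import random


def model_correlation(gamma):
    """Correlation matrix implied by the loadings: sum_k gamma_ik * gamma_jk."""
    n = len(gamma)
    return [[1.0 if i == j else sum(a * b for a, b in zip(gamma[i], gamma[j]))
             for j in range(n)] for i in range(n)]


def simulate_factor_model(gamma, T, rng):
    """Return T records of x_i(t) = sum_k gamma_ik f_k(t) + eta_i eps_i(t)."""
    n_factors = len(gamma[0])
    # eta_i is chosen so that every x_i has unit variance.
    eta = [math.sqrt(1.0 - sum(g * g for g in row)) for row in gamma]
    records = []
    for _ in range(T):
        f = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        records.append([sum(g * fk for g, fk in zip(row, f)) + e * rng.gauss(0.0, 1.0)
                        for row, e in zip(gamma, eta)])
    return records


# Toy block diagonal system: elements 0 and 1 share factor 0, element 2 uses factor 1.
gamma = [[0.6, 0.0],
         [0.6, 0.0],
         [0.0, 0.8]]
rng = random.Random(0)
data = simulate_factor_model(gamma, 5000, rng)
print(model_correlation(gamma))
```

Elements driven by different factors have zero model correlation, which is the "orthogonal clusters" structure of the block diagonal model described next.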

In the first artificial system that we consider here, elements are
grouped in $P = 12$ orthogonal clusters. In terms of factor models, this
orthogonal grouping of elements is expressed by the fact that elements
belonging to different clusters depend on different (independent)
factors, i.e. if $x_i$ belongs to the group $k$ then
$x_i(t) = \gamma_{ik} f_k(t) + \eta_i \epsilon_i(t)$. The dimension of
groups is heterogeneous to mimic typical conditions observed in some real
systems. Specifically the number of elements belonging to each group
ranges from a minimum of 3 elements to a maximum of 17. The other
artificial system that we
have considered is described by a HNFM with P = 23 fac- 
tors. This empirically based model has been introduced 
in Ref. We have chosen these two models because 

they are conceptually very different from each other. In fact, in the
HNFM elements cannot be straightforwardly divided in groups because they
depend on factors in a nested hierarchical way, whereas in the other
model the groups of elements are clearly distinguished because elements
belonging to different groups depend on different and mutually
independent factors. Roughly speaking we can say that the block diagonal
model describes a "separable" system whereas the HNFM represents a
"nested" system. In a first analysis, both the considered factor models
are degenerate models, i.e. the coefficient $\gamma_{ik}$, which
expresses the dependence of the element $i$ on the factor $k$ in the
model of Eq. (13), depends only on the factor and not on the element.
Note that by applying either the ALCA or the SLCA to the correlation
matrix of the two considered models one obtains back the correlation
matrix of the models. This fact is due to the degeneracy of the models
and it gives a certain advantage to hierarchical clustering procedures
with respect to spectral techniques in reconstructing the true
correlation matrix of these systems. In fact neither of the considered
spectral techniques can reconstruct the true correlation matrix $S$ of
the system when applied to $S$ itself. This is the first reason why we
have decided to perform other simulations of the systems by removing the
degeneracy from models. The second reason is that the true correlation
matrix of the system is in general unknown for real data: we have only
one correlation matrix obtained from a single realization of the system
with finite time series length $T$. Accordingly, we have decided to
perform one single realization, say $X_{T_d}$, with length $T_d$ of data
series of each model and we have assumed that the correlation matrix
$C_{T_d}$ of this single realization of each model represents the true
correlation matrix of the corresponding system. This approach removes the
degeneracy of the $\gamma$-parameters of models and at the same time
allows us to treat models in a way more similar to the one used for real
data. In order to test the stability of filtering procedures with respect
to statistical uncertainty (as discussed in subsection IV B), we have
constructed bootstrap replicas of the single realization $X_{T_d}$ of
each model. The bootstrap approach has the advantage that it does not
require assumptions about the data distribution.
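A bootstrap replica of a multivariate data set can be built as in the minimal sketch below. It assumes the common scheme in which whole records (all elements at the same time $t$) are drawn with replacement, so that the cross-correlations within each record are preserved; the text does not spell out the resampling scheme, so this choice, like the function names, is an assumption of the example. A Pearson correlation estimator is included so that each replica can be turned into a sample correlation matrix.

```python
import math
import random


def pearson_correlation_matrix(records):
    """Sample correlation matrix of a list of T records with n elements each."""
    T, n = len(records), len(records[0])
    means = [sum(r[i] for r in records) / T for i in range(n)]
    centered = [[r[i] - means[i] for i in range(n)] for r in records]
    # A column with zero variance would make the norm vanish; not handled here.
    norms = [math.sqrt(sum(row[i] ** 2 for row in centered)) for i in range(n)]
    return [[sum(row[i] * row[j] for row in centered) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def bootstrap_replica(records, rng):
    """Draw T whole records with replacement from the original T records."""
    T = len(records)
    return [records[rng.randrange(T)] for _ in range(T)]


# Illustrative data (not from the paper): 6 records of 2 elements.
records = [[0.1, 0.3], [-0.4, -0.2], [0.7, 0.5], [0.2, -0.1], [-0.6, -0.5], [0.3, 0.4]]
rng = random.Random(1)
replica = bootstrap_replica(records, rng)
print(pearson_correlation_matrix(records))
```

Repeating `bootstrap_replica` many times and filtering the correlation matrix of each replica gives the set of filtered matrices whose mutual distances measure the stability of a filtering procedure.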

We have simulated 1000 independent sets of data for the artificial
systems described by the degenerate models and we have constructed 1000
bootstrap replicas of the empirical data. We have also considered 1000
bootstrap replicas of the single realization with series length $T_d$ of
both the artificial systems, in order to treat the models more similarly
to real data. We have applied all the filtering procedures described
above to the correlation matrix $C_i$ of each simulation or replica $i$
of the artificial systems and to each replica $i$ of the real systems.
Therefore, we have obtained four filtered correlation matrices that we
indicate with $C_i^{filt}$ associated with each realization or replica
$i$ of the systems. The label filt in $C_i^{filt}$ stands for ALCA, SLCA,
B and S depending on the filtering procedure.

Footnote (on the bootstrap approach): We have also used the Cholesky
decomposition of $C_{T_d}$ instead of the bootstrap approach, in order to
obtain different realizations of the non degenerate systems. The Cholesky
decomposition approach [21] allows one to construct mutually independent
realizations of the system. However results obtained with the Cholesky
decomposition are in complete agreement with results obtained by using
the bootstrap technique that we report in the paper. Note also that using
the Cholesky decomposition to perform simulations requires knowledge of
the data distribution (e.g. Gaussian or Student-t), whereas the bootstrap
approach does not require assumptions about such distribution.



A. Information about the model

The first question we want to ask is which filtering procedure performs
better in detecting the correlation matrix of the model. We can ask this
question only for the simulations where we know the model correlation
matrix used to generate the data. In order to evaluate the ability of
filtering procedures to reconstruct the correlation matrix of the model
$S$, we have evaluated the average Kullback-Leibler distance
$\langle K(S, C_i^{filt}) \rangle$ between the correlation matrix of the
model and the correlation matrix filtered from the samples. Averages have
been performed over 1000 realizations of the models. The smaller
$\langle K(S, C_i^{filt}) \rangle$, the larger is the amount of
information about the model that is detected by the filtered
matrix. In Tables I and II we distinguish between de- 
generate models that we indicate with "Block diagonal" 
and "HNFM" and non degenerate models that we indi- 
cate with "Block diagonal (n.d.)" and "HNFM (n.d.)". 
In Table I we report results obtained for all the considered
models when the length of simulated normally distributed
time series is T = 748. In the table, we observe
that the ALCA outperforms all the other filtering procedures
both for degenerate and non degenerate models.
It is also worth noting that the performance of SLCA is better
than that of both the spectral filtering procedures for all the
models, with the exception of the non degenerate block
diagonal model. Such a good performance of hierarchical
clustering filtering procedures was expected for the
degenerate models. Indeed, as we have discussed above,
such models give a certain advantage to hierarchical clustering
filtering procedures because of the degeneracy of
coefficients. The fact that ALCA outperforms all the
other filtering procedures also in the case of non degenerate
models can be explained by taking into account both
the length of data series and the way in which model
degeneracy has been removed. The correlation matrix of
the non degenerate models is by construction the correlation
matrix of a single realization of the corresponding
degenerate models with series length $T_d = 748$. This fact
implies that the dispersion of the non degenerate correlations
from the corresponding values in the degenerate
model is of the order of $D_m = 1/\sqrt{T_d} = 1/\sqrt{748}$. In Table I,
the length of simulated data series is also T = 748,
i.e. $T = T_d$. This fact implies that the statistical uncertainty
associated with the sample correlations is of the
order $1/\sqrt{T} = 1/\sqrt{748}$. This value is equal to $D_m$, implying
that for series length T = 748 the non degeneracy
of model parameters is of the same order as the statistical
uncertainty. In other words, details about specific
correlation values cannot be distinguished from statistical
uncertainty for such short data series. Only the global
structure of the correlation matrix is important, and hierarchical
clustering procedures turn out to be more capable
than spectral techniques of reconstructing the correlation
structure of the models. In order to better understand
the effect of the non degeneracy of model parameters on
the ability of filtering procedures to reconstruct the
model, we also consider a case with time series
longer than in the previous case. Specifically, in Table
II we report results obtained for time series of length
T = 7480, which is ten times the length considered in
Table I. In the case of T = 7480, we continue to observe
a better performance of hierarchical clustering filtering
procedures, and in particular of ALCA, with respect to
spectral techniques for the degenerate models. This fact
was expected because of the degeneracy of the models.
However, in Table II we observe that the spectral technique
producing $C^B$ as the result of the filtering outperforms
hierarchical clustering procedures for the non degenerate
models. The method producing $C^S$ provides a result
which is of the same order as that of $C^{ALCA}$ for the block
diagonal (n.d.) model, whereas it still underperforms with
respect to both hierarchical clustering procedures for the
HNFM (n.d.). The success of $C^B$ can be explained by
the fact that for T = 7480 the statistical uncertainty
of sample correlations is of the order $1/\sqrt{T} = 1/\sqrt{7480}$,
which is smaller than $D_m$. Therefore, for T = 7480 the
non degeneracy of models becomes relevant as compared
with the statistical uncertainty affecting sample correlations,
and spectral techniques turn out to be more capable
than hierarchical clustering of taking into account such
non degeneracy. This aspect is related to the fact that
ALCA and SLCA are filtering procedures characterized
by n - 1 free parameters, whereas spectral methods have
a variable number of free parameters, which scales as
$n^2$ when T tends to infinity.
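
As a worked illustration of the orders of magnitude involved in this argument, the two scales can be evaluated explicitly. This is simple arithmetic on the series lengths quoted above, not an additional result; $D_m$ denotes the dispersion of the non degenerate correlations around the degenerate ones.

```latex
\begin{align*}
D_m = \frac{1}{\sqrt{T_d}} = \frac{1}{\sqrt{748}} &\approx 0.0366, \\
\left.\frac{1}{\sqrt{T}}\right|_{T=748} &\approx 0.0366 = D_m, \\
\left.\frac{1}{\sqrt{T}}\right|_{T=7480} &\approx 0.0116 = \frac{D_m}{\sqrt{10}} \approx \frac{D_m}{3.16}.
\end{align*}
```

For T = 748 the statistical uncertainty is therefore as large as the heterogeneity of the model parameters, whereas for T = 7480 it is about three times smaller.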

In summary, we have shown that hierarchical clustering
procedures better reconstruct the degenerate models
both for short and long time series, whereas for the
non degenerate models the length of data series becomes
relevant in the comparison. Specifically, for short time
series (T = 748), such that the statistical uncertainty
of correlations hides the heterogeneity of model parameters,
we have observed that hierarchical clustering procedures,
and in particular the ALCA, outperform spectral
techniques. On the contrary, for data series long enough
(T = 7480) that the heterogeneity of model parameters
is relevant with respect to the statistical uncertainty of
sample correlations, spectral procedures are typically
more efficient than hierarchical clustering procedures
in reconstructing the correlation matrix of models.



TABLE I: Average value of the Kullback-Leibler distance between the correlation
matrix of the model and the correlation matrix filtered from the sample one. For
each case average and standard deviation are obtained from 1000 realizations or
bootstrap replicas of the system (n = 100, T = 748).

Models                  $\langle K(\Sigma, C^{ALCA}) \rangle$   $\langle K(\Sigma, C^{SLCA}) \rangle$   $\langle K(\Sigma, C^{B}) \rangle$   $\langle K(\Sigma, C^{S}) \rangle$
Block diagonal          0.15 ± 0.01                             0.57 ± 0.04                             0.84 ± 0.03                          1.50 ± 0.05
HNFM                    0.22 ± 0.02                             0.33 ± 0.05                             1.99 ± 0.07                          2.15 ± 0.08
Block diagonal (n.d.)   3.56 ± 0.02                             4.36 ± 0.07                             3.74 ± 0.06                          4.34 ± 0.09
HNFM (n.d.)             3.38 ± 0.02                             3.85 ± 0.08                             4.54 ± 0.08                          5.0 ± 0.1


TABLE II: The same as in Table I but with T = 7480.

Models                  $\langle K(\Sigma, C^{ALCA}) \rangle$   $\langle K(\Sigma, C^{SLCA}) \rangle$   $\langle K(\Sigma, C^{B}) \rangle$   $\langle K(\Sigma, C^{S}) \rangle$
Block diagonal          0.015 ± 0.001                           0.105 ± 0.006                           0.162 ± 0.006                        0.70 ± 0.01
HNFM                    0.023 ± 0.002                           0.032 ± 0.005                           0.986 ± 0.007                        1.44 ± 0.07
Block diagonal (n.d.)   3.418 ± 0.004                           3.94 ± 0.02                             2.95 ± 0.02                          3.41 ± 0.02
HNFM (n.d.)             3.174 ± 0.008                           3.52 ± 0.02                             2.54 ± 0.04                          4.66 ± 0.09



B. Information about the sample correlation
matrix and stability

In this subsection we quantify the amount of information
that different filtering procedures preserve when
applied to sample correlation matrices. This is important
in all those real cases in which one does not know the
model correlation matrix. Moreover, we investigate the
stability of the filtered correlation matrices with respect
to different realizations of the process. We use two quantities
in order to evaluate the performance of the filtering
procedures. The first quantity that we have measured is
the Kullback-Leibler distance $K(C_i, C_i^{filt})$ between the
correlation matrix $C_i$ of the i-th sample and the filtered
correlation matrix $C_i^{filt}$ obtained by applying one of the
filtering procedures to $C_i$. $K(C_i, C_i^{filt})$ is a measure of
the information about $C_i$ that is stored in $C_i^{filt}$: the
smaller $K(C_i, C_i^{filt})$, the larger is the amount of information
about $C_i$ which is retained in the filtered matrix.
The second quantity that we have considered is the
Kullback-Leibler distance $K(C_i^{filt}, C_j^{filt})$ between two
filtered matrices $C_i^{filt}$ and $C_j^{filt}$ obtained by applying the
same filtering procedure to two different simulations (or
replicas) i and j of the system. $K(C_i^{filt}, C_j^{filt})$ measures
the statistical robustness of filtered matrices. The smaller
$K(C_i^{filt}, C_j^{filt})$, the greater is the stability of the filtering
procedure with respect to the statistical uncertainty. In
our estimations, we have averaged both $K(C_i, C_i^{filt})$ and
$K(C_i^{filt}, C_j^{filt})$ over the 1000 independent realizations or
replicas of each system.
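
The two quantities just defined can be computed along the following lines. This is a minimal sketch under stated assumptions: the Kullback-Leibler distance between zero-mean Gaussian distributions is taken in the standard form K(Σ1, Σ2) = ½[log(|Σ2|/|Σ1|) + tr(Σ2⁻¹Σ1) − n]; `toy_filter` is a hypothetical stand-in for a filtering procedure (it is none of the four procedures compared here); and the stability is averaged over all ordered pairs i ≠ j, which is one possible choice since the pairing is not specified in the text.

```python
import numpy as np

def kl_distance(sigma1, sigma2):
    """K(Sigma1, Sigma2) = 0.5 * [log(|Sigma2|/|Sigma1|) + tr(Sigma2^-1 Sigma1) - n]."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n = sigma1.shape[0]
    # slogdet returns (sign, log|det|); both matrices are assumed positive definite
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    # solve(sigma2, sigma1) gives Sigma2^-1 Sigma1 without forming the inverse
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1 + trace_term - n)

def toy_filter(c, shrink=0.5):
    """Hypothetical filter: shrink the off-diagonal correlations toward zero."""
    c = np.asarray(c, dtype=float)
    return shrink * c + (1.0 - shrink) * np.eye(c.shape[0])

def information_and_stability(samples, filt):
    """Return (<K(C_i, C_i^filt)>, <K(C_i^filt, C_j^filt)>) for a list of sample matrices."""
    filtered = [filt(c) for c in samples]
    info = np.mean([kl_distance(c, f) for c, f in zip(samples, filtered)])
    pairs = [kl_distance(fi, fj)
             for i, fi in enumerate(filtered)
             for j, fj in enumerate(filtered) if i != j]
    return info, np.mean(pairs)

rng = np.random.default_rng(0)
n, T = 5, 200                      # small illustrative sizes, not those of the paper
samples = [np.corrcoef(rng.standard_normal((n, T))) for _ in range(20)]
info, stability = information_and_stability(samples, toy_filter)
```

Note that `kl_distance` is not symmetric in its arguments, so the order (sample first, filtered matrix second) matters.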

In Fig. 1 we show the results obtained for the block
diagonal model with degenerate coefficients. In the figure
we plot $\langle K(C_i, C_i^{filt}) \rangle$ versus $\langle K(C_i^{filt}, C_j^{filt}) \rangle$ for all the
described filtering procedures. Averages, which we indicate
with the notation $\langle \cdot \rangle$, are performed over 1000 realizations
and the series length is T = 748. Error bars are one standard
deviation. In all the cases presented in this paper
we have verified that the error interval of plus and minus
one standard deviation around the mean value
includes approximately 67% of the realizations
used to compute the mean value. In the figure we also
report the result of a hypothetical perfect filtering procedure,
i.e. a filtering technique which is able to recover
exactly the model from each realization. In the figure,
we indicate the corresponding correlation matrix with $\Sigma$.
Such a filtering is maximally stable, because it always
recovers the correlation matrix of the block diagonal factor
model. Accordingly, it is $\langle K(\Sigma, \Sigma) \rangle = 0$. This perfect
filtering procedure removes completely the noise due to
the finite length of data series and therefore the quantity
$\langle K(C_i, \Sigma) \rangle \neq 0$. Instead, it is equal to the expectation
value of Eq. (6), i.e. $\langle K(C_i, \Sigma) \rangle \cong 3.54$ for n = 100
and T = 748. Note that we know the position in the
plane of the optimal filtering even if we do not know the
underlying model. This is due to the important characteristic
that the mean value of the Kullback-Leibler
distance is independent of the model correlation matrix
(at least in the multivariate Gaussian case). In the
figure, we observe that all the filtering procedures, except
the SLCA, retain on average more information about
the sample correlation matrix than the true model, i.e.
$\langle K(C_i, C_i^{filt}) \rangle < 3.54$ for $C_i^{filt}$ equal to $C_i^{ALCA}$, $C_i^{B}$ and
$C_i^{S}$. This fact indicates that these filtering procedures
do not discard completely the noise present in the sample
correlation matrix as a consequence of the finite length of
time series. The SLCA algorithm is the only one which
retains less information about the sample correlation
matrix than the true model. Moreover, the SLCA is more
stable than all the other filtering procedures.
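
The model-independent reference values quoted in this section can be reproduced numerically. The sketch below assumes the Gaussian expressions E[K(C, Σ)] = ½[n log(T/2) − Σ_p ψ(p/2)], with p running from T − n + 1 to T and ψ = Γ'/Γ, and E[K(C_1, C_2)] = ½ n(n + 1)/(T − n − 1), which follow from the Wishart results used in the Appendix. The function names are illustrative, and the digamma function is approximated with a standard recurrence plus asymptotic series so that only the Python standard library is needed.

```python
import math

def digamma(x):
    """psi(x) = Gamma'(x)/Gamma(x) for x > 0, via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:              # shift upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def expected_kl_sample_model(n, T):
    """E[K(C, Sigma)]: sample correlation matrix versus the model (position of the perfect filter)."""
    s = sum(digamma(p / 2.0) for p in range(T - n + 1, T + 1))
    return 0.5 * (n * math.log(T / 2.0) - s)

def expected_kl_sample_sample(n, T):
    """E[K(C_1, C_2)]: two independent sample correlation matrices (stability reference)."""
    return 0.5 * n * (n + 1) / (T - n - 1)

# (n, T) pairs of the simulated/NYSE and LSE systems considered in this section
for n, T in [(100, 748), (92, 250)]:
    print(n, T, expected_kl_sample_model(n, T), expected_kl_sample_sample(n, T))
```

Neither function takes the model correlation matrix as an argument, which is the model independence noted above.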

In Fig. 2 we show the results obtained by applying the
considered filtering procedures to the system described
by the HNFM with P = 23 factors and with degenerate
coefficients. In this case, only the ALCA retains
more information about the sample correlation matrix
than the true model. However, it is interesting to note
that both the spectral techniques are at the same time
less informative about the sample correlation matrix and
less stable than both hierarchical clustering filtering
procedures. In other words, for the degenerate HNFM,
hierarchical clustering procedures clearly outperform spectral
techniques. This fact is a consequence of the pure hierarchical
nature of the HNFM. Indeed in Ref. [23], we have
shown that when the hierarchical features of a system
are prominent with respect to the details of specific
correlation values, the spectral procedures have problems in
filtering information about the system. Such problems do
not appear for separable systems, like the block diagonal
model considered above.

FIG. 1: Block diagonal model with degenerate coefficients.
Comparison of the 4 filtered correlation matrices described in
the text. In the graph we plot the stability of the filtered
matrix (x axis) against the amount of information about the
correlation matrix that is retained in the filtered matrix (y
axis). Small values of $\langle K(C_i^{filt}, C_j^{filt}) \rangle$ and $\langle K(C_i, C_i^{filt}) \rangle$
correspond to large stability and large amount of information
preserved by the filtering respectively. The analysis is
performed for a system of 100 elements divided in 12 orthogonal
groups, each one depending on a specific Gaussian factor, i.e.
a block diagonal model. Averages have been performed over
1000 independent realizations of the system and error bars
correspond to one standard deviation.

FIG. 2: Hierarchically nested factor model with degenerate
coefficients. Comparison of the filtered correlation matrices
produced by the 4 techniques described in the text. The
analyzed system is composed of 100 elements following the
HNFM with 23 factors obtained in Ref. [20]. Averages have
been performed over 1000 independent realizations of the
system and error bars correspond to one standard deviation.

FIG. 3: Block diagonal model with non degenerate coefficients.
Comparison of the 4 filtered correlation matrices described
in the text. The analysis is performed for a system of
100 elements divided in 12 orthogonal groups, each one
depending on a specific Gaussian factor, i.e. a block diagonal
model. Averages have been performed over 1000 bootstrap
replicas of a single realization of the system and error bars
correspond to one standard deviation.

In summary, for both the considered models we observe
that hierarchical clustering techniques produce more stable
filtered correlation matrices than spectral procedures.
Concerning the information about the sample correlation
matrix that is stored in the filtered matrix, we observe that
the results obtained for hierarchical clustering procedures are
closer to the perfect filtering (giving as output the true
model of the system) than those of spectral techniques. Finally,
it is worth noting that the SLCA is the most stable among the
considered filtering procedures. Such an excellent performance
of hierarchical clustering techniques can be due to
the degenerate nature of the models, as discussed in the first
part of this section.

In fact, when we remove the degeneracy of coefficients
from the models we observe a different behavior of filtering
procedures. In Fig. 3 we plot $\langle K(C_i, C_i^{filt}) \rangle$ versus
$\langle K(C_i^{filt}, C_j^{filt}) \rangle$ for the artificial system obtained from a
single realization, with time series length $T_d = 748$,
of the factor model with 12 orthogonal factors. This is
equivalent to considering a block model with non degenerate
coefficients. In Fig. 4 we plot results obtained for
the single realization with length $T_d = 748$ of time series
of the HNFM with 23 factors. Also in this case our investigation
is equivalent to considering a HNFM with non
degenerate coefficients. Mean values and error bars in the
figures correspond to the average and the standard deviation
respectively, both estimated over 1000 bootstrap
replicas of the single realization of the models. From
Figures 3 and 4 we note that

$$\langle K(C_i, C_i^{B}) \rangle \lesssim \langle K(C_i, C_i^{S}) \rangle < \langle K(C_i, C_i^{ALCA}) \rangle < \langle K(C_i, C_i^{SLCA}) \rangle .$$

In both the figures, we observe that none of the filtering
procedures is more informative about the sample correlation
matrix than the true correlation matrix $\Sigma$ of both the
models (the correlation matrix of the single realization of
length $T_d$), i.e. $E[K(C, \Sigma)] \cong 3.54$ is smaller
than any $\langle K(C_i, C_i^{filt}) \rangle$ reported in the figures.

FIG. 4: Hierarchically nested factor model with non degenerate
coefficients. Comparison of the filtered correlation matrices
produced by the 4 techniques described in the text.
The analyzed system is composed of 100 elements following
the HNFM with 23 factors obtained in Ref. [20]. Averages
have been performed over 1000 bootstrap replicas of a single
realization of the system and error bars correspond to one
standard deviation.

FIG. 5: Correlation matrix of a real system composed of 100
stocks traded at NYSE during the period from January 2001
to December 2003. The variable investigated is the daily return of
the most capitalized stocks. The length of time series is T =
748 for this system. Averages have been performed over 1000
bootstrap replicas of data series and error bars correspond to
one standard deviation.

Concerning the stability of the filtered matrices, from
the figures we observe that the SLCA filtered matrix
outperforms all the other techniques, although the filtered
matrix given by ALCA has a stability of the same order
of magnitude as that of the SLCA matrix. A good filtered
correlation matrix should be at least more stable than
the sample correlation matrix with respect to the statistical
uncertainty. This requirement translates into the
following inequality

$$\langle K(C_i^{filt}, C_j^{filt}) \rangle < \langle K(C_i, C_j) \rangle . \tag{14}$$

For Gaussian variables we know the expected value of
$K(C_i, C_j)$ from Eq. (7) and thus, for n = 100 and T =
748, the last inequality becomes

$$\langle K(C_i^{filt}, C_j^{filt}) \rangle < \frac{1}{2} \frac{n(n+1)}{T-n-1} \cong 7.81 . \tag{15}$$

This condition is satisfied by all the considered filtered
matrices. However, we stress the fact that the matrices
obtained from hierarchical clustering techniques, and in
particular the one obtained by SLCA, have a value of
$\langle K(C_i^{filt}, C_j^{filt}) \rangle$ an order of magnitude smaller than
the one expected for the Pearson estimator of correlations.

In summary, our investigation of the considered models
shows that spectral filtering techniques are slightly more
informative about the sample correlation matrix than
hierarchical clustering filtering techniques when details
about specific correlation values are relevant, as in the
case of non degenerate models. On the contrary, from
the point of view of stability of filtered matrices,
hierarchical clustering procedures, and in particular the SLCA,
outperform spectral techniques.



C. Empirical data 

In this subsection, we compare the filtering procedures
when applied to real data. We have considered the system
of daily returns of the 100 most capitalized stocks
traded at NYSE in the time period from January 2001 to
December 2003. In this case, the length of the n = 100
time series is T = 748 records. We have also considered
the system of daily returns of 92 highly capitalized
stocks traded at London Stock Exchange in 2002. For
this system the record length of the n = 92 time series is
T = 250.

In Fig. 5 we report the results obtained by applying
all the considered filtering procedures to the system
of n = 100 stocks traded at NYSE, while in Fig. 6
we show the results obtained for the system of n = 92
stocks traded at LSE. In both the figures, we observe
that hierarchical clustering procedures are more stable
than spectral techniques, whereas the latter are more
informative about the sample correlation matrix than
hierarchical clustering. These facts are in agreement with
results obtained for simulations in the case of non
degenerate models. However, this agreement is only
qualitative. Indeed, both the values of $\langle K(C_i, C_i^{filt}) \rangle$ and
$\langle K(C_i^{filt}, C_j^{filt}) \rangle$ observed for the real systems are larger
than the corresponding values obtained in the case of
simulations. This fact can be due to two effects. The
first one is related to the fact that the real systems can be
characterized by a structure of correlations more complex
than the one considered in the models. For example, the
role of the complexity of correlation structures on the
performance of filtering procedures was observed in the
simulations of the degenerate models of subsection IV A
for the spectral techniques. Indeed, the performance of
such procedures was rather unsatisfactory for the HNFM
with respect to the block diagonal model. The second
effect that can be responsible for the quantitative difference
between results obtained for simulations and results
obtained for real data can be related to the fact that we
have considered Gaussian variables in the simulations,
whereas the distribution of returns is fat tailed [23].

FIG. 6: Correlation matrix of a real system composed of 92
stocks traded at LSE during the period from January to
December 2002. The variable investigated is the daily return of the
most capitalized stocks. The length of time series is T = 250
for this system. Averages have been performed over 1000
bootstrap replicas of data series and error bars correspond to
one standard deviation.

Some quantitative differences are also evident in the
comparison of the two real systems. Specifically, both the
values of $\langle K(C_i, C_i^{filt}) \rangle$ and $\langle K(C_i^{filt}, C_j^{filt}) \rangle$ are larger
in the LSE data than in the NYSE data. This
difference is mainly due to the different length of data
series, i.e. T = 748 at NYSE and T = 250 at LSE.
The smaller T, the larger is the statistical uncertainty
of the sample correlation matrix. For instance, we can
quantify this difference by using the expectation
values of the Kullback-Leibler distance of Eqs. (6)
and (7). For a system of n = 100 elements with data series
of length T = 748 we have $E[K(C_1, \Sigma)] \cong 3.54$ and
$E[K(C_1, C_2)] \cong 7.81$, whereas for a system of n = 92
elements and series length T = 250 we have $E[K(C_1, \Sigma)] \cong 9.86$
and $E[K(C_1, C_2)] \cong 27.2$. A comparison of the results
obtained for Gaussian random models in subsection IV B
with the results obtained for the real systems investigated
in this subsection shows that the Kullback-Leibler
distance provides results on real data about the relative
effectiveness of the considered filtering procedures which
are in agreement with those observed for models.



V. CONCLUSIONS 

In conclusion, we have shown that the Kullback-Leibler
distance can be fruitfully used to compare correlation matrices
of multivariate data. We have shown that this distance
is more appropriate to achieve this objective than
the standard Frobenius distance. This fact is due to some
properties of the Kullback-Leibler distance such as the
asymmetry, the model independence of expectation values
and its relation with the maximum likelihood factor
analysis. Sample correlation matrices can be compared
in pairs among them and/or with respect to model matrices
or to filtered matrices. We have used the Kullback-Leibler
distance to compare four different techniques used
to obtain a filtered correlation matrix from the empirical
one. Two of the four techniques that we have analyzed
are spectral filtering procedures based on random matrix
theory, whereas the other two techniques are based
on hierarchical clustering methods, specifically ALCA
and SLCA. Results obtained for simulations are consistent
with those obtained for real data. These results
can be summarized as follows: both the considered spectral
techniques are slightly more informative about the
sample correlation matrix than the other two techniques
based on hierarchical clustering. On the other hand, both
the techniques based on hierarchical clustering produce
filtered correlation matrices which are more stable
than those obtained with spectral procedures. These
results show that the Kullback-Leibler distance is very
useful in characterizing multivariate systems described
by real data, factor models and matrices filtered from
the sample one.

In conclusion, the Kullback-Leibler distance is a powerful
and accurate tool able to characterize the information
and stability of sample, model and filtered correlation
matrices, and it is a useful quantitative indicator for the
relative amount of information and the relative stability
of correlation matrices of multivariate data.



VII. APPENDIX A

In this Appendix, we show how to derive Eq. (3) from
Eq. (2). Let us consider the multivariate Gaussian distributions
$P(\Sigma_1, X)$ and $P(\Sigma_2, X)$ describing the same
random vector X. We have

$$P(\Sigma_i, X) = \frac{\exp\left(-\frac{1}{2} X^T \Sigma_i^{-1} X\right)}{\sqrt{(2\pi)^n |\Sigma_i|}}, \qquad i = 1, 2. \tag{16}$$

By substituting Eq. (16) into Eq. (2) we get

$$K(P(\Sigma_1, X), P(\Sigma_2, X)) = \frac{1}{2} \log\left(\frac{|\Sigma_2|}{|\Sigma_1|}\right) + \frac{1}{2 \sqrt{(2\pi)^n |\Sigma_1|}} \left(I_{12} - I_{11}\right), \tag{17}$$

where

$$I_{ij} = \int \exp\left(-\frac{1}{2} X^T \Sigma_i^{-1} X\right) \left(X^T \Sigma_j^{-1} X\right) dX. \tag{18}$$

The integral $I_{ij}$ can be solved by using the linear transformation
$Y = G_j^{-1} X$, where $G_j$ is the orthogonal matrix
which diagonalizes $\Sigma_j$. It results that

$$I_{ij} = \sqrt{(2\pi)^n |\Sigma_i|} \sum_{q=1}^{n} h_{qq} b_{qq}, \tag{19}$$

where $h_{qq}$ (q = 1, ..., n) are the elements of the diagonal
matrix $G_j^{-1} \Sigma_j^{-1} G_j$, whereas $b_{qq}$ (q = 1, ..., n) are the
diagonal elements of the matrix $G_j^{-1} \Sigma_i G_j$. We can further
simplify the expression of $I_{ij}$ by taking into account
the fact that the matrix $G_j^{-1} \Sigma_j^{-1} G_j$ is diagonal. Indeed

$$\sum_{q=1}^{n} h_{qq} b_{qq} = \mathrm{tr}\left[G_j^{-1} \Sigma_j^{-1} G_j G_j^{-1} \Sigma_i G_j\right] = \mathrm{tr}\left[\Sigma_j^{-1} \Sigma_i\right]$$

due to the orthogonality of $G_j$ and to the invariance of
the trace with respect to rotations. Accordingly, we obtain
that

$$I_{ij} = \sqrt{(2\pi)^n |\Sigma_i|} \; \mathrm{tr}\left[\Sigma_j^{-1} \Sigma_i\right]. \tag{20}$$

Finally, we obtain the expression of
$K(P(\Sigma_1, X), P(\Sigma_2, X))$ given in Eq. (3) by substituting
the last expression of $I_{ij}$ into Eq. (17) and
noting that $\mathrm{tr}[\Sigma_i^{-1} \Sigma_i] = n$.



VIII. APPENDIX B

In this Appendix, we derive the expectation values of
the Kullback-Leibler distance given in Eqs. (5-7). We
shall use two known results from the theory of Wishart
matrices. Let us consider a multinormally distributed
random vector X of dimension n with correlation matrix
$\Sigma$. Let $C_1$ and $C_2$ be two sample correlation matrices
obtained from two independent realizations of the system,
$X_1$ and $X_2$ respectively, both of length T. The first
result from the theory of Wishart matrices that we shall
use hereafter is that $\log |C_i|$, i = 1, 2, is equal to $\log |\Sigma| - n \log(T)$
plus the sum of the logarithms of n mutually independent
chi-squared random variables $y_{T-n+1}, ..., y_T$
with degrees of freedom T - n + 1, ..., T - 1, T respectively
(see for instance [9]). This fact implies that the
expectation value of $\log |C_i|$ is

$$E[\log |C_i|] = \log |\Sigma| - n \log(T) + \sum_{p=T-n+1}^{T} E[\log(y_p)]. \tag{21}$$

Because $E[\log(y_p)] = \Gamma'(p/2)/\Gamma(p/2) + \log(2)$ (see for
instance [25]) we obtain that:

$$E[\log |C_i|] = \log |\Sigma| + n \log(2/T) + \sum_{p=T-n+1}^{T} \frac{\Gamma'(p/2)}{\Gamma(p/2)}. \tag{22}$$

The other result from the theory of Wishart matrices
that we use here is that the expectation value of the
inverse of $C_i$ is $E[C_i^{-1}] = T \Sigma^{-1}/(T - n - 1)$ (see for
instance [9]). Accordingly we obtain:

$$E\left[\mathrm{tr}\left(C_i^{-1} \Sigma\right)\right] = E\left[\mathrm{tr}\left(C_i^{-1} C_j\right)\right] = \frac{n T}{T - n - 1}, \tag{23}$$

where we have used the linearity of the trace operator.
Finally, we have:

$$E\left[\mathrm{tr}\left(\Sigma^{-1} C_i\right)\right] = \mathrm{tr}\left(\Sigma^{-1} \Sigma\right) = n, \tag{24}$$

where we have again used the linearity of the trace and
the fact that $E(C_i) = \Sigma$. By using Eqs. (22) and (23)
it is now straightforward to obtain both the expression
of $E[K(\Sigma, C_i)]$ as given in Eq. (5) and the expectation
value $E[K(C_1, C_2)]$ as given in Eq. (7). Finally, by
using the results of Eqs. (22) and (24) we obtain the
expectation value of $K(C_1, \Sigma)$ as given in Eq. (6).
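
For the reader's convenience, the combination of these ingredients can be written out explicitly. The lines below are a sketch that assumes the Gaussian form $K(\Sigma_1, \Sigma_2) = \frac{1}{2}[\log(|\Sigma_2|/|\Sigma_1|) + \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) - n]$ derived in Appendix A, and the independence of $C_1$ and $C_2$; they are obtained here directly from Eqs. (22), (23) and (24).

```latex
\begin{align*}
E[K(\Sigma, C_1)] &= \tfrac{1}{2}\left\{ E[\log|C_1|] - \log|\Sigma|
      + E\left[\mathrm{tr}\left(C_1^{-1}\Sigma\right)\right] - n \right\} \\
  &= \tfrac{1}{2}\left\{ n\log\frac{2}{T}
      + \sum_{p=T-n+1}^{T}\frac{\Gamma'(p/2)}{\Gamma(p/2)}
      + \frac{n(n+1)}{T-n-1} \right\}, \\[4pt]
E[K(C_1, \Sigma)] &= \tfrac{1}{2}\left\{ \log|\Sigma| - E[\log|C_1|]
      + E\left[\mathrm{tr}\left(\Sigma^{-1}C_1\right)\right] - n \right\} \\
  &= \tfrac{1}{2}\left\{ n\log\frac{T}{2}
      - \sum_{p=T-n+1}^{T}\frac{\Gamma'(p/2)}{\Gamma(p/2)} \right\}, \\[4pt]
E[K(C_1, C_2)] &= \tfrac{1}{2}\left\{ E[\log|C_2|] - E[\log|C_1|]
      + E\left[\mathrm{tr}\left(C_2^{-1}C_1\right)\right] - n \right\}
   = \tfrac{1}{2}\,\frac{n(n+1)}{T-n-1}.
\end{align*}
```

In the first and third lines the identity $nT/(T-n-1) - n = n(n+1)/(T-n-1)$ has been used, and in the third line the two logarithmic terms cancel because $C_1$ and $C_2$ have the same distribution. For n = 100 and T = 748 the last expression gives approximately 7.81, the value used in Eq. (15).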
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